This lesson takes your multi-agent system from development to production. You'll learn how to evaluate your agents to ensure reliable behavior, give your agents persistent memory using Vertex AI Memory Bank, and deploy to a scalable cloud environment using Vertex AI Agent Engine. We'll also briefly discuss the live bidirectional agent architecture. Let's make the magic happen.

Welcome to this lesson. Throughout this course, we have built a sophisticated multi-agent podcast system that can take voice input from the user and return a podcast episode. In this final lesson, we're going to cover the important topic of productionizing the agent we just built. There's a crucial gap between our local development setup and a production-ready system, and production AI agents face unique challenges that development environments don't expose.

In this lesson, we'll explore six fundamental pillars of production AI systems. We'll start with the live bidirectional streaming architecture in ADK, something you would use in a production environment as you build more complex agents and move beyond ADK Web. Then we'll give our agent the persistent memory it needs for a production system. Once both of these are done, our agent is in a place where we can evaluate its performance. And then come the standard practices of deploying our apps to scale, adopting agent security practices, and ensuring agent observability so that our agents are not a black box. As we walk through these practices, we'll also see the different capabilities ADK and Google Cloud provide to address them.

Let's start with live bidirectional streaming. Real-time conversational interfaces are a huge unlock. As humans, we've evolved over thousands of years to talk to each other, and now we can choose to speak with our agents instead of typing.
For that to actually feel natural, you need extremely capable models and low-latency streaming connections to their APIs. This requires bidirectional streaming, audio processing, and interactions that feel instantaneous. And natural conversation takes more than fast responses: you need emotional understanding, pauses, and the ability to interrupt and be interrupted naturally.

The Gemini Live API enables true conversational AI through WebSocket connections that maintain persistent, real-time communication. The system handles complex audio processing pipelines, multilingual voice synthesis with multiple speaker options, and emotion-aware responses that adapt to the user's tone. When you put all of this together, what you get is a conversational experience that feels far more natural and human-like. And ADK provides a very simple interface to plug in this Gemini Live API.

In this course, we ran all of our agents with ADK Web, which simplified and hid most of the complexity involved in creating a live bidirectional agent. We were still interacting with the Gemini Live model and receiving voice responses in real time over WebSockets, but as you scale your agent further, you might want to integrate it with your existing systems and clients, and at that point you'll want full control over how you implement the live bidirectional streaming. Let's have a very quick look at how you might do this with ADK.

ADK abstracts the core agent logic from the transport layer through two fundamental primitives: a live_request_queue to send data to the agent and a live_events stream to receive responses from the agent. This design means your agent logic is completely independent of whether you use WebSockets, Server-Sent Events, or other protocols. The live_request_queue handles different message types like text, real-time audio blobs, and activity signals for natural conversational flow.
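To make the transport-independence idea concrete, here is a toy asyncio model of the two-primitive design: requests go in on a queue, events come out of a stream, and the agent loop in the middle never touches the network. This is a sketch of the pattern, not ADK's actual API; the names `LiveRequest` and `run_live` here are illustrative stand-ins.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class LiveRequest:
    kind: str      # e.g. "text", "audio_blob", "activity_signal"
    payload: str

async def run_live(request_queue: asyncio.Queue):
    """Consume requests and yield events, independent of any transport."""
    while True:
        request = await request_queue.get()
        if request.kind == "close":
            break
        # A real agent would call the model here; we echo instead.
        yield {"type": "agent_response", "text": f"echo: {request.payload}"}
        yield {"type": "turn_complete"}

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    # Any transport (a WebSocket handler, an SSE endpoint, a CLI)
    # just puts requests on the queue...
    await queue.put(LiveRequest("text", "hello"))
    await queue.put(LiveRequest("close", ""))
    # ...and forwards events from the stream back to the client.
    return [event async for event in run_live(queue)]

events = asyncio.run(main())
print(events)
```

Because the agent loop only sees the queue and the event stream, you could swap WebSockets for Server-Sent Events without changing a line of it.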
Meanwhile, the live_events stream yields real-time events including agent responses, turn-completion signals, interruptions, and streaming tool outputs. You then orchestrate both of these using the runner.run_live method that ADK provides. There are links in the resources section to learn more about this live bidirectional streaming architecture and how to implement it.

Next up is providing your agents with persistent memory, which we briefly discussed in lesson two. Agents seem much more intelligent if they can actually remember conversations across sessions, learn user preferences, and maintain context even when systems restart. Agentic memory comes in two types: volatile memory, which disappears when the session ends, and persistent memory, which is long-term memory backed by durable storage. The latter is exactly the kind of memory we're talking about here. In production, agents need to build understanding over time, recognize patterns in user behavior, and provide increasingly personalized experiences. ADK gives you an interface for everything, memory included, and you can plug in a provider for any memory service, like Vertex AI's Memory Bank, mem0.ai, or other databases, with your agent.

Memory Bank is Google Cloud's managed service that transforms raw conversation history into intelligent, searchable knowledge. Unlike simple session storage, Memory Bank uses LLM-powered processing to extract meaningful information from your session data, consolidate it with existing knowledge, and provide semantic search capabilities. So the next time you ask your agent a question from one of your previous sessions, it may be able to use this long-term memory, fetch the context, and give you the exact answer. This enables agents to remember not only what you just said, but what you meant, what you prefer, and how you like to work.
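The extract-consolidate-search pipeline that a managed service like Memory Bank performs can be sketched with a toy in-memory store. In the real service an LLM does the extraction and semantic search; here both are faked with naive keyword rules, purely to show the shape of the pipeline.

```python
class SimpleMemoryStore:
    """Toy stand-in for a managed memory service. Not Memory Bank's API."""

    def __init__(self):
        self.facts: dict[str, str] = {}  # topic -> remembered fact

    def extract_and_consolidate(self, session_transcript: list[str]) -> None:
        """'Extract' facts from a session and merge them with existing ones."""
        for utterance in session_transcript:
            if "i prefer" in utterance.lower():
                # Newer preferences overwrite older ones (consolidation).
                self.facts["preference"] = utterance

    def search(self, query: str) -> list[str]:
        """Naive keyword overlap, standing in for semantic search."""
        terms = set(query.lower().split())
        return [fact for fact in self.facts.values()
                if terms & set(fact.lower().split())]

store = SimpleMemoryStore()
# Session one: the user states a preference.
store.extract_and_consolidate(["I prefer short podcast episodes", "Thanks!"])
# Session two: the preference changes; consolidation keeps the latest.
store.extract_and_consolidate(["Actually, I prefer long episodes now"])
print(store.search("episodes"))
```

The point is that raw transcripts are distilled into a small set of durable facts that later sessions can query, which is what lets an agent "remember what you meant" across restarts.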
The service also handles the complex challenges of memory consolidation: determining what's worth remembering and how it connects to existing knowledge. With this, your agents evolve from fixed systems into intelligent, personalized assistants that grow smarter with every interaction.

After implementing both bidirectional live streaming and persistent memory, we're in a good spot to perform agent evaluation. Fun fact: when I started learning about AI agents, I used to think agent eval was something like unit testing the agent's behavior, but it turns out I was completely wrong. If you're writing traditional unit tests for your agent, you're trying to measure a dynamic system with a static ruler. The probabilistic nature of LLMs demands a change in approach, and that is exactly where we go from verifying correctness to assessing quality. These mean very different things. Verifying correctness can be done through unit tests, but unit tests don't translate to the world of AI agents because agents are non-deterministic. Instead, we need to assess how helpful, harmless, and reliable an agent is. That is the core of evaluating an agent.

Broadly speaking, there are two categories of quality checks we can perform on an agent. We can look into the agent's trajectory: verify that it took the right steps, called the right tools, and used the right sub-agent to get something done. And we can also look at the final response from the agent to check whether the answer was any good and whether it matched our expectations. ADK again provides a comprehensive evaluation framework with built-in metrics like tool trajectory scoring, response matching, and safety evaluation. You can evaluate agents through the web UI if you want to debug them, programmatically via pytest for CI/CD integration, or through the CLI with its adk eval command.
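The two categories of quality checks above can be illustrated with hand-rolled metric functions. ADK ships its own built-in versions of tool trajectory scoring and response matching; these toy functions only show what such metrics measure, using the podcast agent's hypothetical tool names as example data.

```python
def tool_trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected tool calls that appear in the actual trace, in order."""
    matches, i = 0, 0
    for call in actual:
        if i < len(expected) and call == expected[i]:
            matches += 1
            i += 1
    return matches / len(expected) if expected else 1.0

def response_match(expected: str, actual: str) -> float:
    """Crude word-overlap score, standing in for semantic response matching."""
    e, a = set(expected.lower().split()), set(actual.lower().split())
    return len(e & a) / len(e) if e else 1.0

# Did the agent call the right tools in the right order?
score = tool_trajectory_score(
    expected=["search_topic", "write_script", "synthesize_audio"],
    actual=["search_topic", "write_script", "synthesize_audio"],
)
print(score)  # 1.0 — perfect trajectory
```

Note that both functions return scores rather than booleans: because agent behavior is probabilistic, evaluation results are graded against thresholds instead of asserted as exact equality.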
The framework supports both test files for unit testing and comprehensive evaluation sets for integration testing. There's also another option, the Vertex AI evaluation service, which integrates seamlessly with ADK, providing cloud-based evaluation capabilities including advanced NLP metrics for coherence and safety assessment. This service uses pre-built metrics like coherence for response quality scoring and safety for harmlessness evaluation, processing your agent data in the cloud and returning results with configurable pass/fail thresholds. A simple pass or fail makes our decision much easier. With this integration, you can combine local ADK metrics for trajectory analysis with cloud-based Vertex AI metrics for sophisticated language-understanding assessment.

Finally, once we're done with agent eval, it's time to deploy our agents to production. A single agent running locally can't handle thousands of concurrent users. Production applications of all types need automatic scaling, load distribution, and infrastructure management without manual intervention. Scaling applications or AI agents isn't just about handling more requests; it's about maintaining performance, managing costs, and ensuring reliability as demand fluctuates.

Google Cloud's Vertex AI Agent Engine provides turnkey serverless deployment specifically for agentic AI workloads. Unlike generic compute platforms, it understands agent life cycles, manages model loading efficiently, and provides built-in integrations with AI services. Agent Engine isn't just a runtime: it has providers for Sessions and an Example Store service, which is essentially retrieval for multi-shot examples, plus Memory Bank and more. The platform auto-scales based on AI-specific needs and resources, which are configurable by you, ensuring low latency and high availability for your users.
ADK is tightly integrated with Agent Engine, and you can deploy an agent to Agent Engine with just one CLI command, adk deploy. At the end of the day, agents are just another type of application: you can containerize agents and run them anywhere you run your other applications. If you prefer to build all of the agentic services yourself, you can use Cloud Run, our serverless runtime built for scale, reliability, and flexibility. Or if you want full control or the power of Kubernetes, GKE is also a good option.

AI agents present unique challenges when it comes to security. They process natural language, make autonomous decisions, and may have access to sensitive information and powerful tools. This makes robust security mandatory: robust authentication, content filtering, input validation, and protection against prompt injection and misuse. Let's walk through each of those.

Security is a defense-in-depth game, and agentic security requires multi-layered protection. There are two ways in which your agents can authenticate: using the agent's credentials or using the user's credentials. The first approach works when every user of the agent has the same privileges. If that's not the case, then the agent needs to borrow the user's credentials to get the job done. In this scenario, you also need to implement authentication that goes beyond simple API keys to include OAuth flows, service account credentials, and fine-grained identity control based on whether agents act with their own permissions, the user's permissions, or a combination of both. Authorization to resources is another common concern: limiting the scope of agents, tools, APIs, and resources that any principal can access. This is especially important as you scale up and need to govern a large fleet.
This brings us to another important topic: user input sanitization, which helps your agent stay secure against prompt injection, content manipulation, and attempts to make agents perform unauthorized actions. Content filtering combines built-in safety measures from the LLM itself with configurable harm-category thresholds and policy-based evaluation. ADK lets you define before callbacks to sanitize your inputs, either by implementing custom checks or by using LLMs to decide whether a user's prompt is safe to pass through. You can also use specialized services like Model Armor, a Google Cloud service designed to enhance the security and safety of your AI applications. It works by proactively screening LLM prompts and responses, protecting against various risks and ensuring responsible AI practices.

In the same way, after callbacks can safeguard against generated output from the agents that you don't want to expose. Say the agent generates some information about a competitor that you don't want it to reveal. Since you are in control of this callback, you can handle these exceptions and, for example, simply call the model again to get a new output. This is exactly what people mean when they talk about guardrails, and ADK gives you the flexibility, control, and tools to build them. There are also plenty of other safety measures you can take, like sandboxed code execution environments, setting up a Virtual Private Cloud, and adopting comprehensive safety evaluation frameworks to ensure your agents operate within defined boundaries while maintaining functionality.
And this brings us to the final topic of agent observability. Agents are general-purpose systems capable of doing many things, but you need complete visibility into what your agents are actually doing, how they're performing, and whether they're operating safely. When agent evaluation indicates a problem with the quality of your agent, observability is how you'll debug it, so having visibility is crucial.

ADK provides comprehensive observability built on top of OpenTelemetry standards. This includes end-to-end tracing from user input through each internal step to the final response. It also includes real-time monitoring of agent interactions and detailed performance analytics, including token usage, latency metrics, and cost tracking. The framework integrates with leading observability platforms like Weave for real-time visualization, Arize for enterprise monitoring, Phoenix for self-hosted observability, and AgentOps for specialized agent analytics. Each provides different capabilities, from session replays to custom evaluators to automated alerts. You can also use Google Cloud Trace, which aggregates the same traces and provides waterfall views of complex agent interactions across any distributed system. Cloud Trace handles large multimodal payloads and is actually helping to shape the OpenTelemetry standards for LLMs and agents. This comprehensive observability ensures you can monitor, debug, and optimize your agents in production environments.

Congratulations. You've completed the journey from understanding basic agents to architecting production-ready AI systems, and you're ready to build the future of conversational AI. Check out the exercises across the different lessons and the resources section for further reading.